October 01, 2020
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
library(tidyverse)
library(plotly)
library(sf)
library(tigris)
library(leaflet)
library(censusapi)
Sys.setenv(CENSUS_KEY="c8aa67e4086b4b5ce3a8717f59faa9a28f611dab")
The chunk below is to read all the given data and process it so that I have a TOTALKBTU column. I sum it all up and group by CUSTOMERCLASS and DATE parameters.
years <- 2017:2020
quarters <- 1:4
types <- c("Electric", "Gas")
pge_17_20_elec_gas <- NULL
for(year in years){
for(quarter in quarters){
if ((quarter %in% 3:4) && (year %in% 2020)){
next}
for(type in types){
filename <-
paste0("PGE_", year, "_Q", quarter, "_", type, "UsageByZip.csv")
temp <- read_csv(filename)
if(type %in% "Gas"){
temp <- temp %>%
mutate(TOTALKBTU = 100 * TOTALTHM) %>%
select(-TOTALTHM, -AVERAGETHM)
} else {
temp <- temp %>%
mutate(TOTALKBTU = 3.412 * TOTALKWH) %>%
select(-TOTALKWH, -AVERAGEKWH)
}
pge_17_20_elec_gas <- rbind(pge_17_20_elec_gas,temp)
}
}
}
This chunk below is to set up the zipcode filter so that my data only includes relevant information from the 9 bay-area counties.
ca_counties <- counties("CA", cb = T, progress_bar = F)
bay_county_names <-
c(
"Alameda",
"Contra Costa",
"Marin",
"Napa",
"San Francisco",
"San Mateo",
"Santa Clara",
"Solano",
"Sonoma"
)
bay_counties <-
ca_counties %>%
filter(NAME %in% bay_county_names)
usa_zips <-
zctas(cb = T, progress_bar = F)
bay_zips <-
usa_zips %>%
st_centroid() %>%
.[bay_counties, ] %>%
st_set_geometry(NULL) %>%
left_join(usa_zips %>% select(GEOID10)) %>%
st_as_sf()
I summarize and total the data in the 4 relevant fields (residential and commercial electricity and gas) from the bay area counties in a column named TOTALKBTU.
pge_summarize <-
pge_17_20_elec_gas %>%
mutate(
DATE = paste(YEAR, MONTH, "01", sep = "-") %>%
as.Date()) %>%
filter(CUSTOMERCLASS %in%
c(
"Elec- Residential",
"Elec- Commercial",
"Gas- Residential",
"Gas- Commercial")
) %>%
mutate(
ZIPCODE = ZIPCODE %>% as.character()
) %>%
group_by(ZIPCODE) %>%
right_join(
bay_zips %>% select(GEOID10),
by = c("ZIPCODE" = "GEOID10")
) %>%
group_by(CUSTOMERCLASS,
DATE) %>%
summarize(
TOTALKBTU = sum(TOTALKBTU, na.rm = T)
)
pge_summarize
## # A tibble: 169 x 3
## # Groups: CUSTOMERCLASS [5]
## CUSTOMERCLASS DATE TOTALKBTU
## <chr> <date> <dbl>
## 1 Elec- Commercial 2017-01-01 5632166741.
## 2 Elec- Commercial 2017-02-01 4687272512.
## 3 Elec- Commercial 2017-03-01 4906689831.
## 4 Elec- Commercial 2017-04-01 4800958796.
## 5 Elec- Commercial 2017-05-01 5139406247.
## 6 Elec- Commercial 2017-06-01 5223363399.
## 7 Elec- Commercial 2017-07-01 5582878599.
## 8 Elec- Commercial 2017-08-01 5786284979.
## 9 Elec- Commercial 2017-09-01 11259448674.
## 10 Elec- Commercial 2017-10-01 6488648461.
## # … with 159 more rows
pge_chart <-
pge_summarize %>%
ggplot() +
geom_bar(
aes(
x = DATE,
y = TOTALKBTU,
fill = CUSTOMERCLASS
),
stat = "identity",
position = "stack"
) +
labs(
x = "Date",
y = "KBTU",
title = "PG&E Bay Area Total Monthly Utilities Usage, 2017-2020",
fill = "Year"
)
pge_chart %>%
ggplotly() %>%
layout(
xaxis = list(fixedrange = T),
yaxis = list(fixedrange = T)
) %>%
config(displayModeBar = F)
For this section, I first made a variable to include all the relevant data from the bay area and I created a column called AVERAGEKBTU so that we can better understand the overall energy use for each zipcode.
To visualize the neighbourhoods that had the greatest electricity change due to COVID, I had to filter the data so that it included the months of COVID and the months without COVID. However, I didn’t just choose the months of January-March as PRE-COVID because the frame of reference had to be consistent. The winter months of January-March, as can be seen from the chart before, has generally higher electricity consumption due to colder temperatures (electric heaters). As a result, I chose the months of March 1st to June 1st of 2019 and set that as PRE-COVID while I chose March 1st to June 1st for this year and set that as POST-COVID. I did not choose 2018 and 2017 because I thought that those years would be too far off that people’s habits have changed since then. I also only used residential electricity usage because I wanted to visualize who would likely be directly impacted by the increase in electricity bills due to residential electricity consumption increases. One thing to note is that between the two years, the ZIPCODE results are different. The two dataframes had varying entry lengths. When I looked through the data, the older dataset had some zipcodes that weren’t used in the new one and vice versa. I could imagine this is due to the development of the area and the usage of some new zipcodes while data could have perhaps been missing for the others due to mishandling. Either way, some of the zipcodes did not match up so I omitted ones with values that were not available and I believe that cleared up all the values that were missing, thereby matching the available datasets together by zipcode.
## # A tibble: 1,068 x 3
## # Groups: ZIPCODE [267]
## ZIPCODE DATE TOTALPRE
## <chr> <date> <dbl>
## 1 94002 2019-03-01 15670057.
## 2 94002 2019-04-01 13092936.
## 3 94002 2019-05-01 13256889.
## 4 94002 2019-06-01 12921855.
## 5 94005 2019-03-01 2669122.
## 6 94005 2019-04-01 2195329.
## 7 94005 2019-05-01 2207762.
## 8 94005 2019-06-01 2089263.
## 9 94010 2019-03-01 34121552.
## 10 94010 2019-04-01 28494011.
## # … with 1,058 more rows
## # A tibble: 1,068 x 3
## # Groups: ZIPCODE [267]
## ZIPCODE DATE TOTALPOST
## <chr> <date> <dbl>
## 1 94002 2020-03-01 16308012.
## 2 94002 2020-04-01 14838170.
## 3 94002 2020-05-01 14308799.
## 4 94002 2020-06-01 13501779.
## 5 94005 2020-03-01 2777757.
## 6 94005 2020-04-01 2558167.
## 7 94005 2020-05-01 2394262.
## 8 94005 2020-06-01 2255366.
## 9 94010 2020-03-01 35412868.
## 10 94010 2020-04-01 31997446.
## # … with 1,058 more rows
To compare the two datasets, I binded one column of TOTALKBTU from one dataset with the other (and they should be the same given that the data is organized by ZIPCODE, as explained above), and created a new column called PERCENTCHANGE so that I could visualize the percent increase or decrease from 2019 to 2020. I used the percentage change formula to calculate the percent change using 2019 as the frame of reference. I then joined the geospatial data by zipcode so that I could use that to plot my data.
## Simple feature collection with 1098 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -123.5335 ymin: 36.89303 xmax: -121.3083 ymax: 38.93713
## geographic CRS: NAD83
## # A tibble: 1,098 x 6
## # Groups: ZIPCODE [297]
## ZIPCODE DATE TOTALPRE TOTALPOST PERCENTCHANGE geometry
## <chr> <date> <dbl> <dbl> <dbl> <MULTIPOLYGON [°]>
## 1 94002 2019-03-01 1.57e7 16308012. 4.07 (((-122.3359 37.50805, -…
## 2 94002 2019-04-01 1.31e7 14838170. 13.3 (((-122.3359 37.50805, -…
## 3 94002 2019-05-01 1.33e7 14308799. 7.93 (((-122.3359 37.50805, -…
## 4 94002 2019-06-01 1.29e7 13501779. 4.49 (((-122.3359 37.50805, -…
## 5 94005 2019-03-01 2.67e6 2777757. 4.07 (((-122.442 37.69435, -1…
## 6 94005 2019-04-01 2.20e6 2558167. 16.5 (((-122.442 37.69435, -1…
## 7 94005 2019-05-01 2.21e6 2394262. 8.45 (((-122.442 37.69435, -1…
## 8 94005 2019-06-01 2.09e6 2255366. 7.95 (((-122.442 37.69435, -1…
## 9 94010 2019-03-01 3.41e7 35412868. 3.78 (((-122.4126 37.58916, -…
## 10 94010 2019-04-01 2.85e7 31997446. 12.3 (((-122.4126 37.58916, -…
## # … with 1,088 more rows
I created a map to visualize which neighbourhoods experienced the most residential electricity consumption change. I set the domain as -10:20 because I noticed that most zipcodes had an increase in electricity consumption. While there are zipcodes that lie outside this domain, leaflet treats them as NA and marks them as gray on the map. I also set the palette values so that it would reflect the scale on the domain. Decreases in residential electricity result in a grayer outlook on the map while increases in residential electricity result in a darker blue shade.
res_pal <- colorNumeric(
palette = "Blues",
domain =
-10:20
)
leaflet() %>%
addTiles() %>%
addPolygons(
data = pge_compare,
fillColor = ~res_pal(PERCENTCHANGE),
color = "white",
opacity = 0.5,
fillOpacity = 0.5,
weight = 1,
label = ~paste0(
round(PERCENTCHANGE),
" total percentage change in ",
ZIPCODE
),
highlightOptions = highlightOptions(
weight = 2,
opacity = 1
)
) %>%
addLegend(
data = pge_compare,
pal = res_pal,
values = ~-10:20,
title = "Total Percentage Change<br>in Residential Electricity<br>Usage in the months of March-June<br> between 2019 and 2020"
)
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.